In computer science, streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes, typically just one. These algorithms have limited memory available to them (much less than the input size) and limited processing time per item. These constraints may mean that an algorithm produces an approximate answer based on a summary or "sketch" of the data stream in memory; a minimal example of such a sketch is shown at the end of this section.

==History==

An early theoretical foundation of streaming algorithms for data mining, pattern discovery, and machine learning was developed in 1990 by a group at Johns Hopkins University. The theoretical model yields trade-offs between the available memory and the number of passes through a stream of labelled samples. The paper by Heath, Kasif, Kosaraju, Salzberg, and Sullivan, published at AAAI 1991, proved lower and upper bounds for segmenting two-class samples in one dimension, with extensions to learning classical "concepts" such as hyper-rectangles (bumps in statistics) and decision trees in high dimensions. The work formed a chapter of David Heath's PhD thesis at Johns Hopkins University.

Though streaming algorithms had already been studied by Munro and Paterson as well as by Flajolet and Martin, the field of streaming algorithms was first formalized and popularized in a paper by Noga Alon, Yossi Matias, and Mario Szegedy. For this paper, the authors won the 2005 Gödel Prize "for their foundational contribution to streaming algorithms." There has since been a large body of work on data streaming algorithms, spanning a diverse spectrum of computer science fields such as theory, databases, networking, and natural language processing.

Semi-streaming algorithms were introduced in 2005 as an extension of streaming algorithms that allows a constant or logarithmic number of passes over the dataset.
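As a concrete illustration of the one-pass, small-memory regime described above, the following is a minimal sketch of a Flajolet-Martin-style distinct-count estimator in Python. It is an illustrative reconstruction rather than the original algorithm verbatim: the choice of hash function (blake2b) and the helper names are assumptions made for this example. The estimator maintains only a single small integer of state, regardless of the length of the stream.

<syntaxhighlight lang="python">
import hashlib

def trailing_zeros(x: int) -> int:
    """Index of the lowest set bit of x (requires x > 0)."""
    return (x & -x).bit_length() - 1

def fm_estimate(stream) -> float:
    """One-pass, Flajolet-Martin-style estimate of the number of
    distinct items in a stream, using O(1) working memory."""
    max_r = 0
    for item in stream:
        # Hash each item to a 64-bit integer; duplicates hash
        # identically, so repeats never change the sketch.
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h:  # guard against the (vanishingly unlikely) all-zero hash
            max_r = max(max_r, trailing_zeros(h))
    # With n distinct items, the longest run of trailing zeros is
    # about log2(n); 0.77351 is the standard correction factor.
    return (2 ** max_r) / 0.77351

# Roughly 1,000 distinct values observed across a million items.
print(fm_estimate(i % 1000 for i in range(10**6)))
</syntaxhighlight>

A single estimator of this kind recovers only the order of magnitude of the distinct count; practical systems run many independent copies and combine them (as in HyperLogLog) to reduce the variance.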